In presenting material to be learned in a film, is a single, long session, dealing
with the subject in depth, as effective as the same content divided into several short
sessions? In other words, is a long presentation more tiring than a short one?
Groups of psychology students and Navy recruits were given equivalent amounts of 
instruction time, but according to different protocols — either massed presentation or 
spaced presentation. For each of the four film series used, the learning was very 
significant, but the differences between the massed and spaced presentations, as
measured by total scores on the film tests, were no greater than could be accounted 
for by chance alone. Furthermore, the experimental subjects stated that one mode of 
presentation was not more effective in maintaining interest than the other. The 
conclusion drawn is that military training films, presently constituting a twenty-minute 
aid to lecturers, may be lengthened to an hour and become a more central form of 
instruction. (BB)
FOREWORD



This is a report of the results of an experiment on the 
problem of the relative effectiveness of massed versus spaced 
training-film presentations. The research has a direct bear-
ing on the question of how long training sessions of motion
picture films should be. The report relates indirectly to
the question of optimal length of training films. The results 
have practical implications for the scheduling and use of in- 
structional and informational films. 

The Task Order under which the Instructional Film Re- 
search Program has been operating requires that attempts be 
made to establish the scientific principles which should 
govern both the production and utilization of films for the 
purpose of rapid, effective mass training. The research re-
sults given in this report by Dr. Philip Ash relate to a few
aspects of the problems of effective utilization of motion 
picture films. 

It is clear to those who are familiar with the field 
that the problems are complex and difficult. Nevertheless, 
it is believed that Dr. Ash has made a significant contribu- 
tion. Not only have different classes of populations been 
tested, but also a variety of films has been used. The re-
sults point consistently to the main conclusions.

This final technical report is somewhat condensed from 
the basic thesis, which was presented during June 1949 in
partial fulfillment of the requirements for the Doctor of
Philosophy Degree in Psychology. This thesis, with all
tables, tests, and schedules, has been microfilmed; copies
can be made available to individuals who wish to study the
full thesis report. 



C. R. CARPENTER, Director 
Instructional Film Research Program 
The Pennsylvania State College 
SUMMARY



Statement of the Problem 



The central experimental problem studied in this report
may be stated as follows: Given a body of information that
is to be presented by means of films, do people learn more
if they are presented with this content in one long film in
a single session, or if they are presented with the content
broken up into several short units in two or more sessions?

A variety of secondary questions was studied in con-
nection with this central one. Of these, the principal one
was: Is there a diminution in interest as film length is
increased? If so, what relationship is there between learn-
ing and interest measured as a function of film length?



Experimental Procedures 



The study involved two independent experiments, one 
with 11 classes of undergraduate psychology students, the 
other with 10 companies of Navy recruits. 

Psychology classes' experiment. Two film series were
used. Each series included four 15-minute, silent, black 
and white reels. The first series presented a comparative 
study of maturation and learning in a human infant and a 
chimpanzee. The second series dealt with the induction of 
experimental neuroses in cats, and showed methods of cur- 
ing the neuroses. 

Three classes were shown each series in a single hour- 
long session. Two classes were shown each series in two 
sessions, two reels per session. The sessions, lasting 30 
minutes each, were on alternate days. Two classes were 
shown each series in four sessions, one reel per session. 
These sessions, lasting 15 minutes each, were on alternate 
days.



Immediately at the end of each film session, each 
group was asked to fill out an interest rating form. 

One or two weeks after the film showings for each 
series, the experimental classes were tested on the film 
content. At the same time, four classes, which served as 
control groups, were given the film tests without having 
seen the films. 



Navy recruits' experiment. Two film series were used,
each comprising three 15-minute sound reels. One series dealt
with "rules of the nautical road." The other concerned prin-
ciples of elementary hydraulics.


Five companies of recruits were assigned to each series. 
For each series, two companies saw the three reels in a single
45-minute session. Two companies saw the three reels in three 
15-minute sessions, one reel per session. One company took 
the test without seeing the films. The companies were tested 
one week after the experimental film showings. Interest rat- 
ings were made at the end of each film session. 
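
The two designs described above can be tabulated to check that total film time was held constant within each experiment while session length varied. The following Python sketch is only an illustration: the `Condition` class and the condition labels are hypothetical names introduced here, and the figures are taken from the procedure as summarized above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Condition:
    label: str              # hypothetical label, not used in the report
    groups: int             # classes or companies assigned to the condition
    sessions: int           # number of film sessions
    reels_per_session: int  # reels shown in each session
    reel_minutes: int = 15  # every reel ran about 15 minutes

    @property
    def total_minutes(self) -> int:
        """Total film time seen by a group in this condition."""
        return self.sessions * self.reels_per_session * self.reel_minutes


# Psychology classes: four reels per series.
psychology = [
    Condition("massed", groups=3, sessions=1, reels_per_session=4),
    Condition("spaced-2", groups=2, sessions=2, reels_per_session=2),
    Condition("spaced-4", groups=2, sessions=4, reels_per_session=1),
]
PSYCH_CONTROL_CLASSES = 4

# Navy recruits: three reels per series, same layout for each of two series.
navy_per_series = [
    Condition("massed", groups=2, sessions=1, reels_per_session=3),
    Condition("spaced-3", groups=2, sessions=3, reels_per_session=1),
]
NAVY_CONTROL_PER_SERIES = 1
NAVY_SERIES = 2

psych_classes = sum(c.groups for c in psychology) + PSYCH_CONTROL_CLASSES
navy_companies = NAVY_SERIES * (
    sum(c.groups for c in navy_per_series) + NAVY_CONTROL_PER_SERIES
)

for c in psychology + navy_per_series:
    print(c.label, c.sessions, "session(s),", c.total_minutes, "minutes in all")
print("psychology classes:", psych_classes, "Navy companies:", navy_companies)
```

The group counts recovered this way agree with the 11 classes and 10 companies stated at the head of the section.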



Combined Results and Discussion 



Differences in the presentation methods. For each of the
four film series, the differences between the massed and spaced
presentation methods, as measured by the total scores on the
film tests, were no greater than could be accounted for by 
chance alone. When the scores for the sub-tests for each reel 
within each film series were analyzed, in general the same re- 
sults were found. However, for each series, the difference 
between the control and experimental groups was large and high- 
ly significant. 

The interest ratings. In general, variation among the
groups during any session was as great as, or greater than, 
variation between methods. Analysis of the distribution of 
responses to the individual questions on the interest rating 
forms failed to show any consistent differences among the pre- 
sentation methods in student or trainee interest. Finally, the 
correlations between the interest ratings and the film test 
scores were about zero. 



Conclusions 



The principal conclusions are: 

1. Training sessions using films may last as long as an 
hour and still result in significant learning. Long massed
film sessions have not been shown to be significantly less ef- 
fective than short spaced sessions. 

2. Within the time limits employed in this study, sub- 
jects do not seem to find long film sessions less interesting 
than short spaced sessions, and the learning accomplished seems 
to be relatively independent of expressed interest. 
Recommendations



From this experiment arise the following practical re-
commendations:

1. In mass training programs, the scheduling of long
film sessions for training purposes should be explored as a 
means of economizing training time, simplifying scheduling, 
and utilizing instructors more efficiently. 

2. Producers should consider the possible advantages 
of making single long films, where the material calls for 
extended treatment, rather than series of short units. 

3. Further research is needed to determine what the
limits are in lengthening film sessions, what kinds of sub- 
ject matter can be taught in this concentrated manner, and 
to what sorts of people. 



THE RELATIVE EFFECTIVENESS

OF MASSED VERSUS SPACED FILM PRESENTATION

I. INTRODUCTION



Background of the Problem 



Current educational practice with respect to the use of
the instructional film is based largely upon the premise that
the film is an aid to the teacher, rather than an exclusive 
means of instruction. The school day is divided into periods 
of 40 or 50 minutes each; if there is to be time for the class
preparation, the follow-up discussions, and the activities sug-
gested by the film, an instructional film cannot run more than
15 or 20 minutes (13, p. 17). On the basis of this rationale
alone, therefore, there would be scant need to inquire into the
possibility of efficient learning from longer instructional
film units.

In large mass training programs, however, such as those 
utilized by war industry training organizations and the Armed 
Forces, there often arises the question of how long instruc-
tional film sessions can last and still be effective (33).

In these training programs, lack of instructors and lack of
time frequently force an elimination of everything but the
core content as embodied in the film. The film may be re-
quired to do all of the teaching and the instructor may be
replaced by a projectionist. Furthermore, it may be that in
certain educational situations, particularly at the higher 
levels, the same use may be made of films. 

However, limiting the instructional film to 10 or 15 
minutes is not a result of the schedule of the school day 
alone. Cal<> (4), McKown (21), Doane (10), Bernard (1), and
others have suggested that film sessions must be kept short 
for a variety of other reasons, such as that in long sessions 
the learners become sleepy and bored, their attention wanders, 
or the learners may acquire harmful mental habits and be sub- 
jected to “hygienic disadvantages.” On the other hand, some 
have maintained that film sessions may run for several hours 
before serious adverse effects are noted. The convenience of 
scheduling long sessions is held to offset any slight dis- 
advantages or losses that might be obtained.

It is interesting to note that although Doane (10) is
among those who list as one desirable characteristic of in- 
structional films a limit of one reel in length, he points 
out that the criticism that instructional films are generally 
too long is not based on any experimental finding. Further-
more, student evaluation of current educational films presents
evidence which is directly contrary to the criticism. Dur-
ing the course of the Motion Picture Project of the American
Council on Education, 12,000 student ratings were collected 
for a sample of 500 films. The most frequently mentioned 
suggestion which the students made for improving both sound
and silent films was that such films be made longer. Twenty-
six per cent of the students reported that existing films were
too short (13, p. 144).

Current learning theory and experiments in learning fail 
to provide any definitive answer to the question of how long
film sessions may last and still be effective. Little rele-
vant work has been done with highly complex materials in in-
structional films. 

A survey of the practices of instructional film producers, 
however, suggests that the producers have reached a practical 
solution satisfactory to themselves. Although there are ex- 
ceptions, the typical commercially produced film is tailored 
to fit the standard 400-foot reel which runs for just over 10
minutes.

Furthermore, although the Armed Forces were not bound by 
educational practice and although they frequently used films 
with little or no instructor embellishment, the Services pro- 
duced or had produced films which closely approximated these 
limits. The writer compiled a distribution by length in run- 
ning time of 1131 Army films and 882 Navy films. The results 
are reported in Table 1. For each service the mean running 
time was between 18 and 19 minutes. For each service 56 per
cent of the films produced ran for 18 minutes or less. For
89 per cent of the films the running time was less than 30
minutes.
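
The cumulative shares quoted above can be re-derived from the class frequencies in Table 1. The short Python sketch below is offered only as a check on the arithmetic: the function name is hypothetical, the counts are copied from Table 1, and the sixth class is taken as 31-36 minutes on the assumption that all classes below 43 minutes are six minutes wide.

```python
# Upper limit of each running-time class in Table 1, in minutes.
# The sixth limit (36) is an assumption based on the six-minute class width.
UPPER_BOUNDS = [6, 12, 18, 24, 30, 36, 42, 90]

# Number of films in each class, copied from Table 1.
ARMY = [41, 299, 294, 224, 144, 57, 34, 38]
NAVY = [48, 192, 270, 200, 74, 53, 23, 22]


def percent_at_or_below(counts, limit):
    """Per cent of films falling in classes whose upper limit is <= limit."""
    covered = sum(n for n, upper in zip(counts, UPPER_BOUNDS) if upper <= limit)
    return 100.0 * covered / sum(counts)


for service, counts in (("Army", ARMY), ("Navy", NAVY)):
    print(
        service,
        "films:", sum(counts),
        "| 18 min or less:", round(percent_at_or_below(counts, 18), 2),
        "| 30 min or less:", round(percent_at_or_below(counts, 30), 2),
    )
```

On these counts the Army shares come to about 56 and 89 per cent, while the Navy shares come to about 58 and 89 per cent, so the rounded figures in the text fit the Army distribution slightly more closely than the Navy one.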

One important consequence of the emphasis on short films, 
from the point of view of film making, has been the produc- 
tion of series of films, each film a reel long and each cover- 
ing part of an instructional unit. This practice has been 
followed by the Armed Forces, the Office of Education, and 
various commercial producers. At their option, instructors 
may therefore present each small segment separately, or show 
all or several of them at a time. 

From a practical point of view, then, the question of 
the relative effectiveness of "long" versus "short" films
is of considerable interest. It has a bearing, in the oper- 
ating training program, on the economics of scheduling and 
bringing groups of people together; from the production 
point of view it has relevance to planning the length of films.

TABLE 1

FREQUENCY DISTRIBUTION:
RUNNING TIME OF ARMY AND NAVY TRAINING FILMS

                           Army Films            Navy Films
Running Time            Number  Per cent      Number  Per cent

 1 -  6 minutes             41      3.63          48      5.45
 7 - 12 minutes            299     26.44         192     21.77
13 - 18 minutes            294     26.00         270     30.61
19 - 24 minutes            224     19.80         200     22.68
25 - 30 minutes            144     12.73          74      8.41
31 - 36 minutes             57      5.03          53      6.01
37 - 42 minutes             34      3.00          23      2.61
43 - 90 minutes             38      3.37          22      2.46

Total                     1131    100.00         882    100.00

Mean running time         18.99 minutes         18.17 minutes
Standard deviation        10.69 minutes          8.94 minutes

It is suggested, however, that an issue more basic
than convenience of scheduling or production is involved,
namely: Are motion pictures intrinsically different in
their teaching characteristics from lectures or other in-
structional methods? The raising of the question of film
length points to this issue, for practically no educational
theorist has proposed that, lest the capacity of the learn-
ers be exceeded, class room lectures be reduced to [illegible] minutes.
Even in the most "non-participating" situation - for
example, in some of our larger schools and colleges where
lectures are delivered over public address systems - no one
seems to have contemplated limiting the instructional period
to much less than an hour.

Current teaching practice, of course, does not consti-
tute adequate evidence for the solution of the problem of
determining the optimum length of training film sessions.
It may be that films and teachers have quite different ef-
fects on learners; or that teaching practice is itself at
fault; or finally, that teaching practice is essentially
correct and learners can be safely exposed to films for
periods as long as lectures.

Statement of the Experimental Problem









The experimental problem posed for investigation in 
this study may be stated as follows:

When instructional films are employed as an exclusive
means of teaching (i.e., without instructors, pre-
paration, or follow-up discussion), what is the relative
effect on measured learning of presenting a standard one- 
hour film unit in each of the following ways? 

1. In one one-hour period (massed presentation method). 

2. In two or more equally spaced periods, each lasting 
a fraction of the hour and including a suitable sub-unit of
the hour teaching unit (spaced presentation method).

Secondary questions, some of which emerged as a result 
of the conduct of the experiment or were suggested by the 
data, include: 

1. What is the relationship between the amount of 
learning and the subjects' interest in the film series?

2. What is the effect of the possession of previous 
knowledge on learning from massed, as opposed to spaced, 
presentation? 

3. To what extent do relative differences, if any, 
persist or change as the retention period is increased? 
It should be pointed out that, although this experiment
is limited to the situation in which the film is the sole medium
of teaching, there does not seem to be a valid reason for believ-
ing that the relative efficiency of the presentation methods
would be substantially changed if they were used in conjunction
with a more conventional technique of film utilization.



Review of Related Research 



Except for a single short report of a small study made 
during World War II by the Morale Services Division of the Army 
Service Forces, War Department (33), a search of the literature
fails to reveal any investigations directly bearing upon the
problem which is the subject of this experiment. 

However, three lines of study contingent to the problem of 
the optimum length of training film sessions may be noted. 

These include: 

1. Research on the effectiveness of films in comparison 
with other training media. 

2. Investigations of part-whole learning and massed-dis- 
tributed practice. 

3. Studies dealing with the length of class periods, 
particularly in the secondary school and college. 

These areas will be reviewed briefly first, and then the 
Army study will be discussed. 

Learning from films. Since in the present experiment
motion pictures are used as "total teaching procedures" with-
out the aid of instructors and without prior preparation or
subsequent discussion, it is pertinent to inquire whether in 
fact this is a realistic, if not an entirely usual, procedure. 

As VanderMeer (34) has pointed out, while it has rarely been
proposed that films could carry the whole burden of instruction,
"Nevertheless, under conditions which may be specified during
national emergencies when rapid mass training is required ... it
may be necessary to utilize ... [films] ... as a relatively exclu-
sive means of instruction." The question is therefore relevant
not only to the experimental design of this particular investi- 
gation, but also to the probability of occurrence of teaching 
situations for which the findings of this investigation might 
have some degree of pertinence. 

Fortunately, in spite of the often-expressed insistence 
upon the primary role of the instructional film as an aid to 
the classroom lecture, the motion picture research literature 
is replete with instances in which the effectiveness of films
is explicitly compared with the effectiveness of, among other
methods of instruction, the classroom lecture. Of the six 
studies reported by Devereux (2^’ pp. 61-100), one by C. C.
Clark, of New York University, covered precisely this point.
Three equated groups of college students were taught by a
series of sound films, a series of silent films, and a series
of classroom lecture demonstrations, respectively. The gener- 
al conclusion that seemed warranted by the data was that there 
was no significant difference in the efficiency of the three 
methods, as measured by subject matter tests. 

In a review of the literature, Hoban (8, pp. 33^-361)
cited a wide variety of studies in which films were compared 
with lectures and classroom demonstrations, either inciden- 
tally or as the central problem of the study. In none of 
these studies was there evidence to indicate that films were 
significantly inferior, and occasionally they were found to 
be significantly better than lectures or demonstrations.

Among the more recent investigators, Jayne (14) compared
the factual learning from lectures of one group of freshman 
students with the learning from silent films of another group. 
The subject was general science. Jayne found that although 
the immediate gains from the lectures were higher than those 
from the films, these differences became less with the pas- 
sage of time. Philpott (27) compared the performances of 
five groups, taught by film only, film plus commentary, slides 
only, slides plus commentary, and oral lesson, respectively.
He found very small differences from the "film only" method.
Hall and Cushing (12) employed three methods - lecture, read- 
ing assignment, and films - and concluded that the learning 
effected was a function of the material taught and of the 
learner. None of the three methods was consistently best or 
worst. Finally, VanderMeer (34) has reported what appears to 
be the most extensive investigation on this point. Three 
hundred ninth grade public school students were taught a 
semester course in general science by one of three methods:
exclusive film instruction, film plus prepared study guides, 
and "typical instructional methods," respectively. The con- 
trol, or "typical instructional methods," group was taught by 
an instructor using text books, demonstrations, lectures, and 
oral questions and answers. The "film only" group saw the 
films without discussion, teacher comment, or assigned read- 
ing. The "film plus study guide" group saw the films and 
was given mimeographed study outlines for the films, but was 
given no other instruction. The study guides were not dis- 
cussed in class. Analysis of the data suggested that the 
three methods were about equally effective in teaching the 
subject matter, as measured by factual learning. 

It may be concluded, therefore, that (at least for the 
imparting of factual knowledge) sole dependence on films as 
teachers is neither impossible nor unrealistic. The evidence 
suggests that in mass training programs such a procedure may
be both practicable and effective. It would be necessary, of
course, to explore its limits in terms of the kinds of subject 
matter to which it might be applied, and the kinds of learning 
it might bring about or fail to bring about. 

Classical learning experiments. Two concepts current in
learning theory seem to be relevant to the question of whether 
there is any significant difference in the effect on retention 
of presenting a body of material in films massed as a whole in 
one session, or distributed in parts over several sessions. 
These are (1) the concept of massed versus distributed practice
and (2) the concept of whole versus part learning. In point of
fact, the present experimental situation is not subsumed read- 
ily under either of these concepts, and it is difficult to say 
which is the more pertinent. This experimental situation may 
be described as one in which presentation of the whole once in 
a single (massed) session is compared with presentation of the 
parts in several (spaced) sessions. However, it seems worth- 
while to review at least the major findings and to try to 
relate them to the present design. 

Massed versus distributed practice. The practice requir-
ed to learn a task may be continuous, without rest intervals,
or it may be distributed with rest intervals interpolated at 
a number of points. The relative advantages of these two pro- 
cedures have been studied in a wide variety of experiments, 
which have been thoroughly reviewed (20, pp. 119-151; 37, pp.
211-216). McGeoch (20, p. 119) states that "the generaliza-
tion that some form of positive distribution yields faster
learning than does massed practice holds over so wide a range
of conditions that it stands as one of our most general con-
clusions."

The large bulk of the experiments, however, deal with 
rote-memory tasks calling for the learning of nonsense sylla- 
bles, codes, word lists, or poetry; or with perceptual-motor
tasks such as typing, mirror-drawing, mirror reading, or
archery. With respect to complex meaningful materials the 
evidence is not as clear-cut. T. W. Cook (in McGeoch, 20,
p. 126), for example, predicted and found that puzzle solu-
tion was favored by massed practice. On the other hand,
Austin (in McGeoch, 20, p. 129) found that, while there were
no significant immediate differences in retention of prose
selections whether they were studied in five single readings
spaced at intervals of one or two days, or studied in one 
session of five readings, when the recall intervals were ex- 
tended to two weeks and a month the balance of superiority 
shifted to the distributed readings. Gordon and Clark (in 
McGeoch, 20, p. 129) both found the same effect for spaced 
readings of meaningful material. 



While positive distribution or spaced practice has 
generally yielded faster learning, however, the experi- 
ments have not been extended to material of the complexity 
found in motion pictures such as the present experimental 
film series. Furthermore, and perhaps more important, it 
is doubtful whether the procedure employed in this experi- 
ment is sufficiently similar to that employed in investi- 
gations of massed versus spaced practice to permit mean- 
ingful application of the results. 

Whole versus part learning. In the typical part-whole
experiment, a comparison is made between learning the 
material repeated as a whole until some criterion of ef-
ficiency is reached, and learning the material divided into
two or more parts, each part being repeated until a speci- 
fied criterion of efficiency is reached. The whole and 
the "parts" have usually been defined on a quantitative
basis, e.g., as stanzas of a poem. The relevant literature
is reviewed by McGeoch (17, 18, 19, 20) and Woodworth (37,
pp. 216-223). Three observations seem pertinent. First, 
the way in which the concept has been defined and measured 
has required that the tasks used be simple enough to per- 
mit establishing a relatively unequivocal final performance 
criterion. Therefore, the learning of a complex content, 
which can usually be measured only by a test that samples 
items from a wide area, has not been explored. Second, 
the studies have yielded divergent results, many of which
were statistically reliable. Conclusions seem to be limit- 
ed, therefore, to particular tasks, with learning efficiency 
measured in particular ways, qualified by the assurance that 
the subjects were similar, and so forth. Third, in view of 
the foregoing it seems unlikely that this area of research 
provides any guide to judging whether there is a significant
difference in the effectiveness of a single presentation of 
a whole film as compared with the presentation of the film 
in parts spaced over several sessions. 

In short, although the two learning concepts referred 
to have at least a nominal similarity to the experimental 
design, neither has been used in a setting similar to the 
present one. Furthermore, it may be the case that where
subjects are required to apprehend relationships, to grasp 
concepts and generalizations, and to be able to recognize 
rather than recite by rote, the traditional "distribution 
of practice" and "part-whole" concepts are not applicable. 

Classroom practice. It has been suggested that the
question of how long film sessions can last and still be
effective is akin to the question of how long classes can 
last and still be effective. At least until a reasonable 
amount of evidence is available to justify a distinction
between learning from films and learning from lectures, it
would seem pertinent to examine both the experience and the
research bearing on the question of the length of class
periods.
Two rather different problems have been investigated in
connection with the length of class periods. On the one hand,
some workers have studied the relative advantages of different 
amounts of total teaching time during a semester. In these 
studies, either the frequency of class periods or the length 
of class periods was varied. On the other hand, some have in- 
vestigated the effect of changing the length of the period 
while total teaching time was held constant.

Those studies in which total time was held constant, while 
unit time was varied, are relevant to the present inquiry. 

Most of the literature on this point is discursive, or 
descriptive of particular programs, rather than experimental. 

For example, Clevenger (6) and McMillin (22) advocated that
class periods be lengthened (generally from 40 minutes to 60
minutes) because "longer periods save money." Clevenger con-
cluded that no one knew the "best length." Nord (25) compared
the eight-period day with the six-period day, and concluded
that the two are about equal. Greenley (11) advocated 90-
minute periods; he reported that, out of ¥£4 high school students
in his school, 38 selected single (45-minute) periods while the 
remainder selected double (90-minute) periods. Manheimer (23) 
suggested that high schools reorganize on the basis of longer 
periods; he pointed out that summer-sessions experience with 
two-hour periods was eminently satisfactory. 

Bruns (3) described the experience of one high school in
which a shift was made from a schedule of five one-hour periods 
per day to a three period day, with two 90-minute periods in the 
morning and one two-hour period in the afternoon. He reported 
no experimental findings, but claimed that the students and 
teachers preferred the longer periods. Kambly (15) compared a
one-hour, two-semester course in biology with a two-hour, one-
semester course; he found "no differences."

Finally, Stewart (30) reported what appears to be the most 
extensive investigation in this area. He compared the relative
effect of lengthening class periods and increasing total time, 
and also the effect of lengthening class periods with total time 
held constant. It is the latter part of the experiment which is 
pertinent here. One hundred and eighty tenth year high school 
pupils were divided into two equated groups of 90 each. All 
pupils studied four subjects during a school semester of 12 weeks. 
In the "regular" group the students carried the four subjects
concurrently, for periods of 40 minutes daily for the 12 weeks. 

The "concentration group" was divided into subgroups to control 
subject order effects; each subgroup carried two of the subjects 
during the first six weeks in class periods of 80 minutes daily,
and the other two subjects during the second six weeks in periods 
also of 80 minutes daily. Thus, while total class time was held 
constant the "concentration group" was taught each subject in a 
time span half as long as that required for the "regular group." 



At the end of the experimental period, standardized 
achievement tests in the four subjects were administered.
In every subject and on every test the concentration group's
performance was better than the regular group's performance,
and for eight of the 12 tests the ratio of the difference in 
means to the probable error of the difference was above 4.0 
(30, p. 27). 
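
The ratio quoted above is expressed in probable errors rather than standard errors. For a normally distributed statistic the probable error is about 0.6745 of the standard error, so a ratio of 4.0 can be restated in more familiar terms. The Python sketch below is an illustrative conversion under that normality assumption, not a reanalysis of Stewart's data; the function names are hypothetical.

```python
import math

# Probable error as a fraction of the standard error, assuming normality.
PE_PER_SE = 0.6745


def pe_ratio_to_z(ratio_in_pe: float) -> float:
    """Convert a difference/probable-error ratio to a difference/standard-error ratio."""
    return ratio_in_pe * PE_PER_SE


def two_tailed_p(z: float) -> float:
    """Two-tailed normal probability of a deviation at least as large as z."""
    return math.erfc(abs(z) / math.sqrt(2.0))


z = pe_ratio_to_z(4.0)
p = two_tailed_p(z)
print(f"z = {z:.3f}, two-tailed p = {p:.4f}")
```

A ratio of 4.0 probable errors is therefore roughly 2.7 standard errors, which corresponds to a two-tailed probability below 0.01.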

This experiment, therefore, as well as the more or less 
adequately documented opinions of educators, suggests that 
efficient learning may be anticipated, at least in the class- 
room, from a highly concentrated presentation of the subject 
in periods lasting substantially longer than one hour. The 
evidence suggests, in fact, that concentrated attention to a 
few subjects is better than more dispersed attention to 
several.

The Army study. The specific question of the optimum
length of film sessions has been investigated, as far as is
known, in only one previous study. This was an investigation
made by the Research Branch, Morale Services Division, Army
Service Forces (33).

Following is a description of the experimental design 
employed and conclusions reached: 

"Two standard training films on First Aid (TF 8-33 and 
TF 8-150), selected for the experiment because of being
approximately equal in length and difficulty, were shown
to two groups of men. The first group of 350 men were
shown both films consecutively in a session lasting about
an hour. (This group will be referred to as the Long
Session Group.) A second group of 250 men from the same
IRTC (the Short Sessions Group) were carefully matched
against the first group with respect to intelligence,
education and other relevant factors. These men were
shown TF 8-33 at a half-hour session in the morning. In
the afternoon they were shown TF 8-150 at another half-
hour session.

"No difference was found between the Long Session and 
Short Sessions group in the average percentage of new 
material imparted by the film which was shown first.
However, a significant difference was found between the
groups in the amount of new material learned from the
film which was shown last.

"It was found upon further study that almost all of the 
differences between the two groups were accounted for by 
the slower learners within the two groups. (For the purpose
of this analysis, all the men were divided into rapid
learners, AGCT classes I and II, and slow learners, AGCT
classes III and IV.)"

It was reported that, for the second film, rapid learners
in the long session learned 45 per cent new material. Slow 
learners in the long session, on the other hand, learned only 
27 per cent new material. In the short sessions rapid learners 
learned 46 per cent new material and slow learners learned 35 
per cent. 

Although the results of this study seem to suggest that 
spaced film sessions have the advantage, certain questions of 
design and data analysis tend to weaken this conclusion. 

In the first place, it is at least not clear from the
statement of the design that the retention test was fairly spaced
from the experimental sessions. Let us suppose that the long
session group was shown both films in the morning, the short
sessions group was shown one in the morning and one in the afternoon,
and both groups were tested the following morning. This arrangement
seems not unlikely on the basis of the description available.
In this case, the short sessions group would have enjoyed about
a four-hour retention advantage for the second film. This is a
sixth of the 24 hour span, and could well have a significant effect
on performance. For such short intervals the slope of the
retention curve tends to be still quite steep (e.g., cf. 3£, p.
53).

In the second place, the results are expressed only in terms
of per cent indices. Per cent indices are at best uncertain
statistics. They do not necessarily reflect the absolute magni-
tude of the original scores. In general, regardless of the
statistical significance of a difference in mean scores, these
indices will tend to yield larger percentage differences at one
end of the performance scale, and smaller differences at the
other end.1

In view, therefore, both of the questionable character of 
the experimental design and the ambiguity of the statistics, 
it is doubtful that much confidence can be placed in the find- 
ings reported. 

This survey of literature dealing with problems contingent 
to the present one suggests that research on these problems has 
at most provided very general and somewhat ambiguous conclusions. 
It has been demonstrated that films can be used effectively as 
total teaching devices. There is a considerable body of opinion
and some experimental evidence to indicate that a small number 
of long class sessions led by instructors are, if anything, 
better than a large number of short sessions. One small study 
suggests, without arousing conviction, that long film sessions 
are not as effective as short spaced sessions for slow learners. 

1 See microfilm of the original dissertation: footnote 1, page
21. That footnote embodies a statistical critique of the per
cent indices used.

General Experimental Design



The research to be reported consists of two experiments.
[Illegible] feasible to deal with these experiments
simultaneously. Therefore, it may [illegible] to both, and to
point out the essential differences between them.

In the first experiment, two series of four 15-minute
films were shown, a month apart, to introductory psychology
classes at The Pennsylvania State College. Three methods of
presentation were employed: a massed film presentation method
using a single session lasting one hour and including all four
films, a spaced presentation method using two sessions lasting
30 minutes each and including two films per session, and a spaced
presentation method using four sessions lasting 15 minutes each
and including one film per session.

In the second experiment, two series of three films were
shown to Navy apprentice seamen at the [illegible]. [Illegible]
were shown concurrently, to different [illegible] including all
three films, and a spaced presentation method using
three 15-minute sessions and including one film per session.

In both experiments the subjects rated the films as to
interest value, immediately at the end of each session. All
groups were tested one week after the experimental
presentations, with the exception of three classes which were
shown the second psychology series. These classes were tested
approximately two weeks later.

[Illegible] of the films to the subjects' knowledge,
irrespective of the method of presentation. This use of the
control group has two important advantages (in factual learning
experiments) over the use of pre-tests to determine initial
status with respect to the experimental content: it avoids
sensitizing the [illegible] items, and, as a corollary, it tends
to maximize inter-group differences. Particularly with tests of
low reliability, the "after-only" procedure tends to be the
more sensitive.



II. THE PSYCHOLOGY CLASSES’ EXPERIMENT: DESIGN AND PROCEDURES 



The first experiment was conducted with 11 classes of intro-
ductory psychology students as subjects. Seven classes served
as experimental groups. These classes were shown two four-reel
series of motion pictures. The remaining four classes were used
as control groups. These classes took the tests on the films,
but did not see the films. Three presentation methods were em-
ployed: a one-part method, in which all four reels were shown
during a single class period; a two-part method, in which the
first two reels were shown in one period, and the second two in
a succeeding class period; and a four-part method, in which one
reel was shown in each of four periods.

In this chapter, the films and tests, the scheduling, the 
specific experimental procedures, and the subjects will be de- 
scribed. In the following chapter the statistical results ob- 
tained from this experiment will be presented. 



Films and Tests 



Film criteria. The following criteria were established for
the film material to be used in the study:

1. The films should have general technical adequacy, in 
terms of photography, coverage of teaching content, clarity of 
presentation and so forth. 

2. The units of any one series should be produced with
sufficient standardization of treatment to permit combination and
smooth transitions.

3. The films should be appropriate for the experimental 
populations involved. 

4. Duplication of material in the units comprising a
series should be held to a minimum. 

5. If possible, the units of a series should not have a 
necessary sequence, to permit the study of order effects. 

6. The series should consist of four units of approximate- 
ly equal length, and the four together should run about an hour. 

7. The film material should be new to the learners, if
possible. 

8. The films should be non-dramatic, factual presentations. 






When specific films were examined with these criteria 
in mind, several points became evident. First, in order 
that the second criterion, that of homogeneous treatment, 
be met, it became clear that an already existing series 
would have to be employed. It was not possible to combine 
independently produced units into a series, and at the same 
time to avoid duplication and to realize smooth transitions. 
However, as a result of this conclusion, it was necessary to
drop the fifth criterion, relating to effects of order, since
existing series almost invariably had a definite sequence.
This limitation was not too serious, because it soon became
apparent that, even if order could be varied, the large number
of groups necessary to vary order of films within a series
would not be available.

When it was indicated that the introductory psychology 
classes at the College would be made available, the film 
catalogues, particularly that of the Psychological Cinema 
Register (j&), were searched and a large number of films were 
screened. Several series of films were considered, only to 
be rejected because they were judged too difficult, too short, 
too long, or otherwise inappropriate. The two series finally 
selected seemed to meet all the criteria to a satisfactory 
degree. These two series each consisted of silent black and 
white, 16 millimeter films. When run at normal speed (16 
frames per second) they took more time than could be allowed.
However, viewing tests revealed that they could be run at 
sound speed (24 frames per second) without loss of visual 
quality or too hasty presentation of the explanatory titles. 
Accordingly, in the experiment the films were shown at sound 
speed. Thus, each film was shown 50 per cent faster than the 
producer intended, and as a result it was shown in two-thirds 
of the time usually required. 
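
The relation between the two projection speeds is simple arithmetic, sketched below as a check. The two frame rates come from the text; the 22.5-minute reel in the usage line is a hypothetical figure chosen only to illustrate the conversion, and the function name is illustrative.

```python
from fractions import Fraction

SILENT_FPS = 16  # normal speed for these silent films, per the text
SOUND_FPS = 24   # sound speed, at which the films were actually shown

# Ratio of projection speeds: 24/16 = 3/2, that is, 50 per cent faster.
speed_ratio = Fraction(SOUND_FPS, SILENT_FPS)

# Fraction of the usual running time required: 16/24 = 2/3.
time_fraction = Fraction(SILENT_FPS, SOUND_FPS)


def minutes_at_sound_speed(minutes_at_silent_speed):
    """Running time of a reel when projected at sound speed."""
    return minutes_at_silent_speed * time_fraction


# Hypothetical reel that would run 22.5 minutes at silent speed.
print(speed_ratio, time_fraction, minutes_at_sound_speed(Fraction(45, 2)))
```

Exact fractions are used so that the two-thirds figure is not blurred by floating-point rounding.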

The films used. The film series used were (1) Dr. W. N.
Kellogg's The Ape and Child Series, and (2) Dr. Jules H.
Masserman's The Dynamics of an Experimental Neurosis Series.

A brief outline of the film content, based largely on the
Psychological Cinema Register catalogue descriptions, follows.

PCR-80-83: The Ape and Child Series.

PCR-80: Some Behavior Characteristics of a Human
and a Chimpanzee Infant in the Same Environment
(running time, 14.8 minutes). The general behavior
of a normal human infant between the ages of 10 and
14½ months is compared step by step with analogous
behavior of his chimpanzee companion between the
ages of 7½ and 12 months. Six phases of behavioral
development are illustrated, as well as the early
effects of human environment upon the ape and some
basic differences between the ape and the child.

PCR-81: Comparative Tests on a Human and a Chimpanzee
Infant of Approximately the Same Age (running time, 14.5
minutes). The reactions of the human infant are com-
pared with the responses of the chimpanzee to a series
of psychological tests. The tests include: handedness,
startle reaction time, delayed reaction, cap-on-head
tests, rotation tests, and others.

PCR-82: Experiments Upon a Human and a Chimpanzee
Infant After Six Months in the Same Environment (run-
ning time, 14.5 minutes). Some of the more complex
tests solved by the human infant, age 16 to 19 months,
and the chimpanzee, age 13½ to 16 months, are demon-
strated. Five tests are presented, involving simple
perceptual-motor tasks.

PCR-83: Some General Reactions of a Human and a
Chimpanzee Infant After Six Months in the Same Environ-
ment (running time, 14.4 minutes). Incidental or
non-experimental behavior of the human infant and the
chimpanzee are compared. Nine types of comparisons
are made, including those involving upright walking,
reaction to colored picture book, differences in climb-
ing ability, eating with a spoon and drinking from a
glass, beginning of cooperative play, pointing to parts
of the body, imitation of writing, and affectionate
behavior toward each other.

PCR-58-61: The Dynamics of an Experimental Neurosis: Its
Development and Techniques for its Alleviation.

PCR-58: Conditioned Feeding Behavior and Induction of
Experimental Neurosis in Cats (running time, 15
minutes). Cats are trained to respond to a light or
bell signal by going to a food box into which
pellets are automatically released, and to obtain the
food. An air blast blown just as the cat obtains the
food is then employed to induce a motivational con-
flict. This induces inhibition of the feeding and a
variety of "neurotic" patterns in and out of the ex-
perimental situation.

PCR-59: Effects of Environmental Frustrations and
Intensification of Conflict in Neurotic Cats (run-
ning time, 12 minutes). Various types of environ-
mental frustration are contrasted with those in-
duced by the experimental motivational conflict.

PCR-60: Experimental Diminution of Neurotic Behavior
in Cats (running time, 15 minutes). Four therapeutic
techniques are demonstrated: (1) diminution of one of
the conflicting drives by manual or forced feeding out-
side the cage; (2) retraining in the problem situation:
petting, gentle hand-feeding, "reassurance"; (3) en-
vironmental press - a maximally reinforced hunger
drive and a movable barrier which slowly forces the
animal closer to the food, resulting in a break-
through of the cat's inhibitions and hurried gulping
of the food; and (4) "social" example, set by a
normal cat who has learned to feed at the [illegible];
the neurotic cat gradually joins in the food-taking
behavior.

PCR-61: Active Participation in Establishing More
Satisfactory Adjustment (running time, 15 minutes).
Normal cats are trained to depress a small disk
platform which serves as a switch to activate their
feeding signal. When the switch is turned off, or
a barrier is placed between the cat and the food,
the animals show various substitute responses; when
the switch again works the signals, or the barriers
are removed, the animals resume the normal feeding
pattern. When these animals are shocked or given
an air-blast at the moment of feeding, they develop
all the neurotic behavior manifested by the cats in
the previous reels, although generally in milder
form. Most of the neurotic switch-workers, although
at first avoiding the switch, gradually reexplore
its use until they reestablish the self-signaling
and feeding pattern, despite repetitions of the air
blast.



These films were unfamiliar to almost all the students
who participated in the experiment. In the case of the Cat
Neurosis Series one student in each of two classes indicated
that he had seen the four reels before; in the case of the
Ape and Child Series between two per cent and six per cent
of the students in each class had seen one or more of the
reels.

The tests. Objective tests employing four-choice questions
were constructed for each film series. The test for
the Ape and Child Series included 78 items, 20 for each of
the first two reels and 19 for each of the last two. The
test for the Cat Neurosis Series included 80 items, 20 for
each reel.1



1 The tests and rating forms for all the films will be
found in the microfilms for the dissertation, Appendix B.

Five scores were obtained for each test: number right on
each of the four subtests, and total number right. All subjects
were allowed and encouraged to answer all the items, and were
instructed to guess when necessary. A check showed that less
than one per cent of the items were omitted.

The reliabilities of these tests were estimated by the
Kuder-Richardson method of rational equivalence (16). Use was
made of the Kuder-Richardson formula number 20 (see footnote 2).
The reliability of the Ape and Child Series test, based on the
scores for all the subjects in the experimental group, was .51.
The reliability of the Cat Neurosis Series test was .73.
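
As an illustration of how Kuder-Richardson formula 20 is computed, a minimal sketch follows. It assumes each item is scored 1 (right) or 0 (wrong) and uses the variance of total scores with divisor N; the function name and the tiny response matrix are hypothetical and are not data from this study.

```python
def kr20(responses):
    """Kuder-Richardson formula 20 for a list of per-subject 0/1 item scores.

    r = (n / (n - 1)) * (s_t^2 - sum(p*q)) / s_t^2
    where n is the number of items, s_t^2 the variance of total scores,
    and p, q the proportions passing and failing each item.
    """
    n_subjects = len(responses)
    n_items = len(responses[0])

    totals = [sum(row) for row in responses]
    mean_total = sum(totals) / n_subjects
    var_total = sum((t - mean_total) ** 2 for t in totals) / n_subjects

    sum_pq = 0.0
    for j in range(n_items):
        p = sum(row[j] for row in responses) / n_subjects
        sum_pq += p * (1.0 - p)

    return (n_items / (n_items - 1)) * (var_total - sum_pq) / var_total


# Hypothetical data: four subjects, three items.
example = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
print(kr20(example))
```

For this toy matrix the total-score variance is 1.25 and the item variances sum to 0.625, so the coefficient works out to 0.75.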

The interest rating form. An Interest Rating Form was
devised to obtain the following data:

1. A roster of those who attended each film session;

2. An indication of the proportion of those who had
already seen any of the films;

3. An indication of interest in the subject matter
of the films; and

4. An indication as to whether or not the sessions
were judged too short or too long, and as to whether or not there
was any constant trend of interest (for the spaced methods groups).
Weights from zero to two or four (depending upon the number 
of choices permitted by the question) were assigned to the re- 
sponses for each question except the first, which asked the sub- 
ject whether he had seen the films before. The zero was assigned 
to the most negative response. The rating score was the sum of 
the weights for the responses for the seven questions scored. 

This score had a possible range from zero to 18. In addition, 
an analysis was made of the distribution of responses to each 
question separately. 
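
The stated range of zero to 18 over seven scored questions constrains how many questions carried a maximum weight of two and how many a maximum weight of four. The sketch below works through that arithmetic. It assumes every scored question has a maximum weight of exactly two or four, as the text implies; the function names and the sample response weights are illustrative only.

```python
N_SCORED_QUESTIONS = 7   # questions scored, per the text
MAX_RATING_SCORE = 18    # upper end of the possible range, per the text


def splits_by_max_weight(n_scored, max_score):
    """List (questions with max weight 2, questions with max weight 4) pairs
    whose maximum weights add up to max_score."""
    return [
        (n_two, n_scored - n_two)
        for n_two in range(n_scored + 1)
        if 2 * n_two + 4 * (n_scored - n_two) == max_score
    ]


def rating_score(weights):
    """Rating score: the sum of the weights of the scored responses."""
    return sum(weights)


print(splits_by_max_weight(N_SCORED_QUESTIONS, MAX_RATING_SCORE))
print(rating_score([2, 2, 2, 2, 2, 4, 4]))  # most favorable response throughout
```

Under that assumption the only consistent split is five questions weighted zero to two and two questions weighted zero to four.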



Scheduling 



The psychology classes used in this experiment met three 
times a week, on alternate days, for 50-minute periods. If the 
second period was held in the afternoon (on a Wednesday or 
Thursday), the first and third periods were held in the morning 
(on a Monday and Friday, or on a Tuesday and Saturday). If 
the second period was held in the morning, the first and third 
were held in the afternoon. 



2 The reliability coefficient is

$$ r = \frac{n}{n-1} \cdot \frac{s_t^2 - \sum pq}{s_t^2} $$

where $n$ is the number of items, $s_t^2$ is the variance of the test scores,
and $\sum pq$ is the sum of the item variances.



This class schedule made it impossible to conform exactly
to the plan of having the same interval between the parts (in
the four-part distributed presentation), since a weekend had
to intervene. It also precluded the possibility of placing the
test date one week from the mid-point of the series. Further-
more, during the course of the experiment it was found necessary
to change a few dates in order to meet unexpected situations.
The principal change involved delaying, for one class in each
methods group for the Cat Neurosis Series, the test until ap-
proximately two weeks after the experimental showings.

It is believed that the "one-week" tests were administered
far enough out on the retention curve to reduce to negligible
proportions the effect of differences in the actual length of
what was nominally a week, and that the same situation obtained
with respect to the "two-week" retention tests.

Table 2 presents the schedule followed for the Ape and Child
Series, and Table 3 presents the schedule for the Cat Neurosis
Series. The time of day for each period is omitted; it has been
pointed out that the periods for each class were staggered. The
specific dates are also omitted. The Ape and Child Series was
shown during February and March 1948; the Cat Neurosis Series
was shown during the month of April 1948 to the same classes.



Procedures Followed 



For the purposes of this experiment the films were used 
not as teaching aids but as total teaching instruments . The 
films were not described in any detail before presentation, 
they were not discussed in the classes during the course of the 
experiment, and they were not explicitly related to the rest 
of the content of the psychology course. This somewhat un-
usual procedure was followed in order to assure a maximum degree
of uniformity with respect to the content presented in the films.

To achieve this uniformity, each film series was presented 
to each experimental class with a standard introduction which 
very briefly identified the series and explained in general the 
purpose of the study. The specific objective - that of compar- 
ing massed with spaced presentation - was not mentioned, so as 
not to prejudice rating of film length. The introduction was 
read at only the first session of the spaced presentations. 

The massed presentation required a whole class period; the
spaced presentations required only part of a period. However,
in every case the first part of a spaced presentation was given
at the beginning of the class period. In a few cases the suc-
ceeding parts were shown at the end of the period.

At the appointed time the experimenter, or an assistant,
and a projectionist came to the scheduled class. Rating forms
were distributed, and, for the first session, the introductory
statement was presented verbatim.

In the second and succeeding sessions for the spaced pre- 
sentation methods groups, an announcement was made merely to 
the effect that the second (or third, or fourth) film was to 
be shown. 



At the end of each session, the rating forms were completed
and collected by the experimenter. Thus, the one-part groups
made one rating, on the entire series; the two-part groups made
two ratings, one on each of the two pairs of two reels; and the
four-part groups made four ratings, one on each of the four
reels.

Projection facilities (screens, projectors, and an operator)
were provided by the Audio-Visual Aids Library at the College.
For the one-part and two-part showings two projectors were used
to obviate the necessity of a time lag in setting up the second
and succeeding reels.

On the whole, projection conditions were excellent, as 
most of the classrooms used were equipped with blackout curtains 
and wall screens. 

Test scoring procedures. Test responses were recorded on
IBM answer sheets which were later machine-scored.



The Experimental Population 



The eleven classes of students taking the first (introduc- 
tory) undergraduate course in psychology at the College which 
participated in the experiment included a total initial popula- 
tion of 545 students. However, after those subjects who missed 
one or more of the reels of either one series or the other, or 
failed to take the test for the series for which they saw all 
the reels, were eliminated, a sample of 460 was left for which 
complete data were available for the Ape and Child Series, and
a sample of 410 was left for which complete data were available
for the Cat Neurosis Series. Only 370 subjects provided com-
plete data for both film series.

The principal analysis was conducted for each film series
separately on the sample (of 460 or 410 subjects) for which
complete data for that series were available. The sample of
370 subjects was used only to calculate the correlation between
performance on the two film series tests.



Two indices were employed to determine whether the 
classes were equivalent in initial ability. These were the 
all-college grade-point average and the final course grade. 

The all-college grade-point average. This is a weighted
average of the grades earned by the student for his college
work to date. The grades given at the College are "3" (highest),
"2," "1," "0," "-1," and "-2." A grade of "-1" or
"-2" is a failure.

Although some workers, such as Borow (2), have contended
that the college grade-point average is a rather unsatisfac-
tory measure of achievement or ability, it has been used in
a very large number of studies as the principal, if not sole,
criterion of college achievement, and has served to validate
college entrance and college aptitude tests. Use has been
made of it in this way at The Pennsylvania State College by
Borow (2), Coblentz (7), Castore (5), Roulette (2°), Schultz
(2£), Whittaker (36), and Mertens (24), among others.

Furthermore, several studies indicate a high degree of
reliability in the sense of stability from year to year.
Weaver (35) reported a reliability for the grade-point aver-
age over a four-year period of .88. Castore (5) found a
correlation of .82 between the first semester grade-point
average and the final grade-point average for all curricula
at The Pennsylvania State College. Mertens (24) reported a
correlation of .90 between third semester grade-point average
and grade-point average to date for sophomore women in the
education curriculum, and a correlation of .87 for sophomore
women in the liberal arts curriculum.

This index may therefore be accepted as a relatively 
stable measure of general academic achievement at the College. 

Final psychology grade. This variable cannot be con- 
sidered a stable index of achievement, due to probable dif- 
ferences in instructor rating practices and requirements. 
However, it generally correlated to a greater extent with 
the film test scores than the all-college grade-point average, 
and was therefore retained as a matching variable. 

Table 4 presents the means and standard deviations of
the all-college grade-point averages and final psychology
grades for the subjects for whom complete data were available
for the Ape and Child Series; Table 5 presents the same infor-
mation for the population for the Cat Neurosis Series. The
following observations seem pertinent. First, the two indices
are moderately correlated with each other (r = .5). Second,
the all-college grade-point average is somewhat less variable
than the final psychology grade in both cases: for the Ape
and Child Series the grade-point average means for the classes
ranged from 1.186 to 1.551, while the final psychology grade

TABLE 4

MEANS, STANDARD DEVIATIONS, AND CORRELATIONS FOR ALL-COLLEGE GRADE-POINT
AVERAGE (GPA) AND FINAL PSYCHOLOGY GRADE (PG), FOR SUBJECTS SEEING THE
APE AND CHILD SERIES, GROUPED BY CLASS AND METHOD OF PRESENTATION

Method        Class     n    GPA Mean   GPA SD   PG Mean   PG SD      r

Control          4      40     1.431     .695     1.175    1.022    .616
                 9      43     1.368     .606     1.186    1.105    .357
                10      40     1.328     .637     1.225     .790    .605
                11      42     1.467     .608     1.238     .895    .557
  Total                165     1.399     .639     1.206     .963    .519

1-Part           3      29     1.337     .692     1.283     .969    .572
                 5      38     1.347     .583     1.158     .904    .548
                 8      44     1.390     [?]      1.591    1.007    .643
  Total                111     1.361     .600     1.414     .982    .581

2-Part           2      41     1.350     .743     1.341     .873    .805
                 7      60     1.186     .516     1.500     .806    .285
  Total                101     1.253     .642     1.436     .837    .511

4-Part           1      43     1.551     .689     1.698     .977    .597
                 6      40     1.423     .606     1.400     .800    .340
  Total                 83     1.489     .653     1.554     .909    .501

All Classes            460     1.374     .634     1.370     .941    .523

TABLE 5

MEANS, STANDARD DEVIATIONS, AND CORRELATIONS FOR ALL-COLLEGE GRADE-POINT
AVERAGE (GPA) AND FINAL PSYCHOLOGY GRADE (PG), FOR SUBJECTS SEEING THE
CAT NEUROSIS SERIES, GROUPED BY CLASS AND METHOD OF PRESENTATION

Method        Class     n    GPA Mean   GPA SD   PG Mean   PG SD      r

Control          4      37     1.482     .692     1.189     .982    .624
                 9      41     1.378     .649     1.366     .904    .507
                10      34     1.322     .591     1.176     .706    .439
                11      47     1.388     .627     1.191     .891    .531
  Total                159     1.393     .638     1.233     .884    .535

1-Part           1      47     1.524     .700     1.596     .938    .611
                 2      41     1.334     .657     1.293     .862    .644
                 3      21     1.440     .695     1.524    1.006    .570
  Total                108     1.437     .689     1.268     .934    .620

2-Part           6      37     1.480     .600     1.205     .787    .386
                 8      32     1.423     .514     1.656    1.049    .642
  Total                 69     1.453     .563     1.522     .926    .489

4-Part           5      23     1.521     .525     1.348     .698    .274
                 7      50     1.235     .523     1.300     .849    .329
  Total                 73     1.325     .540     1.521     .813    .264

All Classes            410     1.403     .625     1.395     .902    .509

means ranged from 1.158 to 1.698; for the Cat Neurosis Series
the means for the grade-point average ranged from 1.235 to
1.524, while the means for the final psychology grade ranged
from 1.176 to 1.656. In both cases, the intra-class vari-
abilities are greater for the final psychology grades than
for the all-college grade-point averages. Third, attrition
of the sample (as between the Ape and Child Series population
and the Cat Neurosis Series population) resulted both in a
constriction in the range of the means for both indices, and
in a slight increase in the means. For example, the all-college
grade-point average mean for the entire complete data group
for the Ape and Child Series was 1.374 (Table 4), while the
comparable mean for the Cat Neurosis Series was 1.403. These
observations are consistent with the general hypothesis that
better students attend class more regularly. It is believed,
however, that while there may have been a slight amount of
self-selective sampling, it was not sufficient to disturb the
experiment seriously.
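
The "Total" rows of Tables 4 and 5 are means weighted by class size, which offers a simple check on the tabled figures. The sketch below recomputes two of the methods-group grade-point means for the Ape and Child Series, taking the class sizes and class means as read from Table 4 (the readings themselves are an assumption, since several digits are hard to make out). The function name is illustrative.

```python
def pooled_mean(groups):
    """Mean of all subjects, given (class size, class mean) pairs."""
    total_n = sum(n for n, _ in groups)
    return sum(n * mean for n, mean in groups) / total_n


# (n, GPA mean) for the classes in each methods group, as read from Table 4.
one_part_gpa = [(29, 1.337), (38, 1.347), (44, 1.390)]
two_part_gpa = [(41, 1.350), (60, 1.186)]

print(round(pooled_mean(one_part_gpa), 3))
print(round(pooled_mean(two_part_gpa), 3))
```

Both results agree with the tabled totals of 1.361 and 1.253 to three decimals.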

An analysis of variance was made for these two indices,
for each film series separately.3 There is no evidence that
either the classes or methods groups are heterogeneous with
respect to all-college grade-point average. The classes with-
in methods are not significantly heterogeneous with respect
to the final psychology grade, and for both samples there is
but slight evidence of heterogeneity (F-ratio significant at
the five per cent level of confidence) among the methods groups
with respect to this variable. The conclusion may be drawn
that the groups are essentially comparable in initial status.
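
For readers unfamiliar with the F-ratio, a minimal one-way sketch follows. It is a simplification: the analysis reported here also tested classes within methods, which this sketch does not attempt, and the scores used are hypothetical, not data from the study. The function name is illustrative.

```python
def one_way_f(groups):
    """Return (F, df_between, df_within) for a list of lists of scores."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Between-groups sum of squares: group means around the grand mean.
    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups
    )
    # Within-groups sum of squares: scores around their own group mean.
    ss_within = sum(
        (x - sum(g) / len(g)) ** 2 for g in groups for x in g
    )

    df_between = k - 1
    df_within = n_total - k
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


# Hypothetical grades for three methods groups (not from the study).
hypothetical_groups = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
f_ratio, df_b, df_w = one_way_f(hypothetical_groups)
print(f_ratio, df_b, df_w)
```

The resulting F would then be compared with the tabled value for the same degrees of freedom at the chosen level of confidence.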



3 All the analysis of variance and covariance tables, show-
ing sums of squares and crossproducts, and mean squares, have
been omitted from this report. The tables are available on
microfilm.





III. THE PSYCHOLOGY CLASSES’ EXPERIMENT: RESULTS



The principal analysis for this experiment was based on 
the total test scores for each series. The means of the classes 
and the methods groups were compared by an analysis of variance 
and, for each film series, the effect of initial status 1 on 
test performance was determined by a covariance analysis that 
took into account final psychology grade and grade-point average. 

An analysis was then made of the comparability of the 
methods as reflected in the ratings. 

Finally, using only the data from those subjects who had 
seen both series and taken both tests, the correlation between 
achievement on the two tests was calculated. 

The detailed results will be reported for each film series 
separately. Subject to minor variations, however, these results 
may be summarized for both film series as follows: 

1. The experimental classes made very appreciable "gains"
in comparison with the control classes who did not see the films.
In other words, learning took place.

2. The differences among the three methods of presentation,
as measured by the total test score, are consistently in favor
of the spaced methods of presentation as opposed to the single
massed session. However, these differences are negligible and
insignificant, whether the test is administered at the end of
one week or at the end of two weeks.

3. This finding also applies in general to the subtest
scores.

4. There is a slight but significant relationship between
test performance and initial status, but the lack of significant
differences among the three methods cannot be attributed to
differences in the initial status of the groups.

5. Analysis of the rating scores provides no reliable or
consistent indication of discrimination among the three methods
of presentation for the Ape and Child Series, and the ratings
are independent of test performance. However, a small propor-
tion of the subjects in the massed-method group reported that
the film series was "too long." The ratings for the Cat Neurosis
Series suggest the possibility of differentiating among the
methods. In the massed presentation group between 60 and 80
per cent of the subjects rated the session "too long."



1 While "final psychology grade" is not properly a measure of
"initial status," it is sufficiently close to being so to permit
use of the term "initial status" to denote both matching
variables.
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In short, although superficial examination of the raw scores 
might suggest some slight evidence favoring the spaced method of 
presentation, in the analysis of the data this difference fails 
to be statistically significant. The results from the psychology 
experiment indicate that, within the limits of sampling error, 
learning will be approximately as efficient if the learner is
presented with a body of material in a single film session lasting
one hour, as if he is presented with the same material in
four 15-minute sessions spaced approximately equally over a week.



The Ape and Child Series 



The basic data for the Ape and Child Series test - means,
standard deviations, and standard errors for the total test score
and the subtest scores - are presented in Table 6. The mean
score for the massed (one-part) method group represents a gain
of 22.75 points over the control group mean, while the mean scores
for the two spaced methods groups represent gains of 23.51 and
23.44 points, respectively, over the control group. The films
effected a significant amount of learning as measured by this
test. Furthermore, while there is a suggestion of a consistent
difference in favor of the spaced methods, this consistency is
not sustained when the means for the classes are compared. Thus,
while Classes 3 and 5 have lower (total) means than any other
class, the mean of Class 8 exceeds the mean of one class in each
of the two spaced methods groups. Furthermore, all three classes
in the massed method group have higher means than any of the
other classes on the second subtest (Part 2).


Comparison of the presentation methods. An analysis of
variance was made for the total test score and for the subtest
scores to determine whether any of the differences exceeded
chance expectation. The results are reported in Table 7.

It may be noted that the inter-class and inter-methods differences
among the experimental groups on the total test score
are statistically insignificant. The F-ratio for the methods is
less than 1. 5
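The meaning of an F-ratio below 1 (less variation between methods than within them) can be illustrated with a minimal sketch. The scores below are hypothetical and are not the report's data; the function name `one_way_f` is likewise an assumption introduced only for this illustration.

```python
from statistics import mean

def one_way_f(groups):
    """Return the one-way analysis-of-variance F-ratio for a list of score lists."""
    all_scores = [x for g in groups for x in g]
    grand_mean = mean(all_scores)
    k = len(groups)                 # number of groups (methods)
    n_total = len(all_scores)       # total number of subjects
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual scores around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within

# Hypothetical scores for three presentation methods (NOT the report's data).
massed = [50, 54, 48, 53, 51]
two_part = [52, 49, 55, 50, 53]
four_part = [51, 56, 49, 52, 50]

f_ratio = one_way_f([massed, two_part, four_part])
print(f_ratio < 1)  # the group means differ less than chance variation would suggest
```

With group means this close together, the between-methods mean square is smaller than the within-methods mean square, so the ratio falls below 1.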

The same thing is true for the first and third subtests. 

A slight lack of homogeneity is indicated for the second and
fourth subtests. However, review of the means (Table 6) shows
that for the second subtest the difference is in favor of the
massed presentation. For the fourth subtest it is in favor
of the two spaced methods. These differences are, in both
cases, of the order of a score point, and cannot be considered
of practical importance.



See footnote 3, Page 27.

5 This may be interpreted as meaning that the variation between
methods is less than the variation within the methods.

[Table 6 fragment: Experimental Group, Total, n = 295; remaining entries illegible]

In every case, the difference between the experimental 
and control group is significant beyond the 0.1 per cent 
level of confidence. 

The variance estimates for the total score were adjusted
by means of a covariance analysis for "initial status" as
measured by grade-point average to date and final psychology
grade. The multiple correlation between these variables considered
jointly and the total test score for the entire experimental
group was .29.

Since the regression accounts for only about eight per 
cent of the total test score variance, it would require ex- 
ceptionally large differences in initial status to change 
the relative standing of the groups with respect to the film 
test performance. 
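The "about eight per cent" figure follows from squaring the multiple correlation of .29. A brief arithmetic sketch (the variable names are illustrative only):

```python
# Proportion of total test score variance accounted for by the regression
# on the two matching variables is the square of the multiple correlation.
multiple_r = 0.29
variance_explained = multiple_r ** 2

print(f"{variance_explained:.1%}")  # 8.4%
```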

When the variance estimates for the total test score
were adjusted for this multiple correlation, for both the
experimental classes and methods groups there was a slight
reduction in the estimate of error, and a slight increase
in the mean square attributable to the "between groups."
However, the F-ratios remain well below the five per cent
level of significance. In short, the lack of differences
among the experimental methods cannot be considered an
artifact of initial differences among the groups, at least
with respect to these two matching variables. 4

The interest ratings. The Ape and Child Series is one
of the most popular with students in psychology classes at
The Pennsylvania State College. Kellogg's subjects, the
child Donald and the chimpanzee Gua, are sprightly and
charming, and the content of the films is readily grasped
(if not completely remembered) by almost all students.

Analysis of the interest ratings merely adds statistical
evidence to these observations. The highest possible interest
score was 18 points. The mean for no session was lower
than 13.7, and the ratings were very markedly massed at the
high end of the scale. No subject had a rating score of less
than 7.



4 See footnote 3, Page 27.
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TABLE 8 



MEAN INTEREST RATINGS, AND CORRELATIONS BETWEEN RATINGS
AND TEST SCORES: APE AND CHILD SERIES

Method   Film Period    Class      n   Mean Rating     SD       r

1-Part   First (Only)   3         29      15.52       1.25    .024
                        5         38      15.26       1.46    .019
                        8         44      14.82       2.52   -.028
                        Total    111      15.15       1.94    .005

2-Part   First          2         41      14.41       1.23    .029
                        7         60      15.43       1.79   -.146
                        Total    101      15.16       1.59   -.101

         Second         2         41      14.76       1.22    .071
                        7         60      15.43       1.79   -.146
                        Total    101      15.16       1.59   -.101

4-Part   First          1         43      14.35       1.31   -.003
                        6         40      14.47       1.84   -.157
                        Total     83      14.41       1.59   -.071

         Second         1         43      13.91       1.07   -.026
                        6         40      14.55       1.56   -.031
                        Total     83      14.22       1.37   -.071

         Third          1         43      13.69       1.30    .134
                        6         40      14.50       1.40    .135
                        Total     83      14.08       1.41    .190

         Fourth         1         43      13.91       1.27   -.151
                        6         40      14.92       1.38    .136
                        Total     83      14.40       1.42   -.006







Since a rating was made at the end of each session, the
massed method classes made one rating, based on all four reels.
The two-part method classes made two ratings, the first on
reels one and two, the second on reels three and four. The
four-part method classes made four ratings, one after each
reel.

The initial hypothesis was that the mean rating of the
massed presentation group would be lower than that for any
other session, on the grounds that exposure to an hour of
film would bring more than satiation and would [illegible]
responses to such questions as "Would you [illegible]
in this series?" or "Did this film hold your attention?"
With respect to the spaced methods groups, it was thought that
rating means would be higher in the first session than in
later sessions.

This hypothesis was not sustained. Table 8 presents
the mean rating scores by session, class, and methods group,
and the correlations between the ratings and the test scores
for the subtest or tests for the reels on which the rating
was based.
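As an arithmetic check on how a "Total" row of Table 8 relates to its class rows, the total mean can be recomputed as a mean weighted by class size. The sketch below uses the one-part class sizes and mean ratings as read from Table 8; the function name `pooled_mean` is an illustrative assumption, not a term from the report.

```python
def pooled_mean(sizes, means):
    """Mean of several class means, weighted by the number of subjects in each class."""
    total_n = sum(sizes)
    return sum(n * m for n, m in zip(sizes, means)) / total_n

# One-part (massed) classes 3, 5, and 8, as read from Table 8.
one_part_sizes = [29, 38, 44]
one_part_means = [15.52, 15.26, 14.82]

one_part_total = pooled_mean(one_part_sizes, one_part_means)
print(round(one_part_total, 2))  # 15.15
```

The result agrees with the total mean rating of 15.15 reported for the 111 subjects in the one-part group.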

Although some of the inter-sessions differences may be
significant, they were considered too small to merit statistical
analysis. The only observation with respect to these
mean scores which seems of any importance is that the means
for the classes participating in the one-part (massed) presentation
were slightly higher than the means for almost all
the other classes.

None of the correlations between the ratings and the
test scores differ significantly (at the five per cent level
of confidence) from zero. In other words, test performance
was essentially independent of attitude toward the film series
or the presentation method, as reflected in these ratings.
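One way to see why even the largest coefficient in Table 8 (r = .190, n = 83) falls short of the five per cent level is the Fisher z approximation, sketched below. This is an assumed, commonly used test substituted for illustration; the report does not state which significance test was applied, and the function name is hypothetical.

```python
from math import atanh, sqrt
from statistics import NormalDist

def fisher_z_p_value(r, n):
    """Two-sided p-value for H0: rho = 0, using the Fisher z approximation."""
    z = atanh(r) * sqrt(n - 3)          # approximately standard normal under H0
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Largest correlation in Table 8: four-part group, third session total.
p_value = fisher_z_p_value(0.190, 83)
print(p_value > 0.05)  # not significant at the five per cent level
```

Because the largest coefficient does not reach significance under this approximation, the smaller coefficients for groups of similar size would not either.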

The considerable skewing and restriction in the range of
scores may have contributed to reducing the correlations.
However, as will be pointed out below, even for the series
for which the ratings were more widely distributed, the
correlations between test performance and rating score were of
about the same order as reported here.

An analysis was also made of the [distribution]
of responses to the eight questions included on the rating
form. This analysis revealed the popularity of the
series to no small extent. Except for one student in a single
session in each of the classes exposed to the four-part
presentation method, everyone reported that the films held his
interest most or all of the time. For all [sessions]
but two, the film or films shown were rated by a majority as
"very good" or "excellent."






In point of fact, of the eight questions asked, only one, 
that directly relating to film length, provides any consistent 
discrimination among the methods groups. No one in the two 
spaced methods groups thought the sessions (lasting either 15 
or 30 minutes) too long, while 3.4 per cent, 4.5 per cent, and
10.5 per cent, respectively, of the three classes exposed to 
the massed presentation reported that they thought the film 
was too long. On the other hand, between 10.5 per cent and 
24.2 per cent of the massed presentation classes thought the 
session too short. 



While a small proportion of the subjects thought that 
the hour session was too long, this length has not been demon- 
strated to have had a deleterious effect either on reported 
interest or measured learning. 



The Cat Neurosis Series



The general conclusions to be drawn from the analysis
of the test scores and rating scores for the Cat Neurosis
Series of films are substantially the same as those reported
above for the Ape and Child Series. The basic test data for
the Cat Neurosis Series are presented in Table 9. It will be
remembered that, for this series, one class in each methods
group took the test two weeks after the experimental showings,
while the remaining classes took the test one week after the
showings. This difference had a significant effect on test
performance. The mean total score of the subjects in the one-week
retention group represents a gain of 16.68 points over
the mean of the control group, while the mean score for the
two-week retention group represents a gain of 13.65 points.
Furthermore, the mean total score for each method in the one-week
group is higher than the mean score for any method in
the two-week retention group. In other words, some forgetting
took place.

For both the total test score means and the subtest 
score means within each retention group there is a consistent 
trend favoring the spaced presentation methods. These dif- 
ferences, however, are consistently small. 

Comparison of the presentation methods. An analysis of
variance was made for both the total scores and the subtest
scores to determine whether (1) the inter-methods differences
were significant, (2) the inter-retention period differences
were significant, (3) there was any interaction between the
retention period and the presentation method. Significant
interaction could be interpreted to mean that the relative
effectiveness of the methods, as measured, depended in part
upon the length of time that elapsed between the experimental
sessions and the test. For example, one might find that
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